Cooperation enhanced by the difference between interaction and learning neighborhoods for evolutionary spatial prisoner's dilemma games
We study an evolutionary prisoner's dilemma game with two layered graphs, where the lower layer is the physical
infrastructure on which the interactions are taking place and the upper layer represents the connections for
the strategy adoption (learning) mechanism. This system is investigated by means of Monte Carlo simulations 
and an extended pair-approximation method. We consider the average density of cooperators in the stationary 
state for a fixed interaction graph, while varying the number of edges in the learning graph. According to the 
Monte Carlo simulations, the cooperation is modified substantially in a way resembling a coherence-resonance- 
like behavior when the number of learning edges is increased. This behavior is reproduced by the analytical 
results. 
I. INTRODUCTION

Cooperation can be found in many places in the real
world, from biological systems to economic and social systems
[1]. An altruistic action, which benefits others at the
expense of one's own investment, appears to contradict our 
understanding of natural selection controlled by selfish indi- 
vidual behaviors. Thus understanding the conditions for the 
emergence and maintenance of cooperative behavior among 
unrelated and selfish individuals becomes a central issue in 
evolutionary biology [2]. In the investigation of this problem 
the most popular framework is game theory together with its 
extensions involving evolutionary context [3, 4]. The pris- 
oner's dilemma (PD), a two-person game in which the play- 
ers can choose either cooperation (C) or defection (D), is a 
common paradigm for studying the evolution of cooperation 
[5, 6]. In the traditional version of the PD game, two interacting
players are offered a certain payoff, the reward R, for
mutual cooperation and a lower payoff, the punishment P, for 
mutual defection. If one player cooperates while the other de- 
fects, then the cooperator gets the lowest sucker's payoff S, 
while the defector gains the highest payoff, the temptation to 
defect T. Thus we obtain T > R > P > S. It is easy to 
see that defection is the better choice irrespective of the
opponent's selection. For this reason, defection is the only
evolutionarily stable strategy in fully mixed populations of C and
D strategies [3]. 

Since cooperation is abundant and robust in nature, con- 
siderable efforts have been concentrated on exploration of 
the origin and persistence of cooperation. During the last
decades, five rules, namely, kin selection [7], direct
reciprocity [8], indirect reciprocity [9], network (or spatial)
reciprocity [10, 11], and group selection [12], have been found
to benefit the evolution of cooperation in biological and
ecological systems as well as within human societies (for a
recent review, see [13] and references therein). In realistic
systems, most interactions among elements are spatially
localized, which makes spatial or graph models more meaningful.
Unlike the other four rules, spatial games (i.e., network
reciprocity) can lead to cooperation in the absence of any strategic
complexity [£, J^ [Til Hi (for a recent review of evolutionary
games on graphs, see [15]). In spatial evolutionary PD
games, the cooperators can survive by forming large com- 
pact clusters, which minimize the exploitation by defectors. 
Along the boundary, cooperators can outweigh their losses 
against defectors by gains from interactions within the clus- 
ter H IH IH • 

In spatial models |[l3, [H H [H, [i3l, the players occu- 
pying the vertices of a graph can follow one of the two pure 
strategies (C or D), and collect payoffs from their neighbors 
by playing PD games. Sometimes the players are allowed to 
modify their strategies according to an evolutionary rule
dependent on the local payoff distribution. To describe real
systems we can introduce two different graphs [13]. The
"interaction graph" determines who plays with whom. The
"replacement graph" (or learning graph) determines who
competes with whom for reproduction, which can be genetic or
cultural. To our knowledge, most existing works assume that
the interaction and replacement graphs are identical. The
different roles of these graphs raise a natural question: How is
cooperation affected when the interaction and replacement
graphs are different? Ifti et al. [18] have studied the
continuous PD game when the interaction neighborhood (IN)
and learning neighborhood (LN) are different. In the lattice
topology, it was observed that when the neighborhood sizes
for "interacting" and "learning" differ by more than 0.5,
cooperation is not sustainable [18]. Now we wish to study what
happens if the players can follow only one of the two pure
strategies and the LN for the individuals is inhomogeneous.

In this paper, we address these problems by considering an 
evolutionary PD game on two layered graphs. The lower layer 
is the physical infrastructure on which the interactions are tak- 
ing place (interaction layer), and the upper layer represents 
the information flows (learning or imitation layer). For the 
sake of simplicity, we study the case where the lower interac- 
tion layer is a square lattice. Generally, one can expect that 
the size of the LN is larger than that of the IN, which can 



be understood as follows. After each round of the game, not 
only do the interacting players exchange information about 
their own payoffs and strategies, they also share information 
about their neighbors and their neighbors' neighbors. To ex- 
plore the influence of the difference between the interaction 
and learning graphs on the evolution of cooperation, we keep 
the IN fixed and vary the size of the LN. In what follows two 
types of models are systematically investigated. In the first 
case (model I), we simply increase the size of the LN for all 
the players at the same level. In the second case (model II), 
we endow the players with heterogeneous abilities to obtain 
information, i.e., some players have a larger size of LN than 
others. 
II. MODEL

We consider the PD game with pure strategies: either C
or D. On the interaction layer (a square lattice), each player
plays PD games with its four neighbors and collects a payoff
determined by the strategy-dependent payoff matrix. The total
payoff of a player is the sum over all its interactions. We
assume that a cooperator pays a cost c for another individual
to receive a benefit b (b > c), and a defector pays no cost and
does not distribute any benefits. Thus the reward for mutual
cooperation is R = b - c, the sucker's payoff is S = -c, the
punishment for mutual defection is P = 0, and the temptation
to defect is T = b. Following [17], the payoffs are rescaled
such that R = 1, T = 1 + r, S = -r, and P = 0, where
r = c/(b - c) denotes the ratio of the costs of cooperation to
the net benefits of cooperation.
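The rescaling above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `rescaled_payoffs`, `pd_payoff`, and `total_payoff` are hypothetical and do not come from the original work.

```python
def rescaled_payoffs(b: float, c: float) -> dict:
    """Divide the cost-benefit payoffs by the net benefit (b - c)."""
    net = b - c
    return {"R": (b - c) / net, "T": b / net, "S": -c / net, "P": 0.0}


def pd_payoff(mine: str, other: str, r: float) -> float:
    """Payoff of one pairwise game with R = 1, T = 1 + r, S = -r, P = 0."""
    if mine == "C":
        return 1.0 if other == "C" else -r
    return 1.0 + r if other == "C" else 0.0


def total_payoff(mine: str, neighbor_strategies, r: float) -> float:
    """Total payoff: sum over the games with all interaction neighbors."""
    return sum(pd_payoff(mine, s, r) for s in neighbor_strategies)
```

For example, with b = 3 and c = 1 one has r = 0.5, so the rescaled temptation is 1.5 and the sucker's payoff is -0.5.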

After each round of the game, the players are allowed to 
inspect their learning neighbors' payoffs and strategies, and, 
according to the comparison, determine which of their strate- 
gies to adopt in the next round. Following previous studies 
H [H O m i^ mi m, the evolution of the present sys- 
tem is governed by the adoption of strategy by a randomly 
chosen player i and one of its learning neighbors j, namely, 
the player i will adopt the learning neighbor's strategy with a 
probability dependent on the payoff difference (U_i - U_j) as

$$W = \frac{1}{1 + \exp[(U_i - U_j)/\kappa]}, \tag{1}$$

where κ characterizes the noise introduced to permit irrational
choices. The limits κ → 0 and κ → ∞ denote the completely
deterministic and completely random selection of the neighbor's
strategy, respectively, while any finite positive value of κ
incorporates uncertainties in the strategy adoption, i.e., the
strategy of the better-performing player is readily adopted, but
there is a small probability of selecting that of the worse one.
The effect of the noise κ on the stationary density of
cooperators in the spatial PD game has been studied in detail in
Refs. [20, 22]. Since this issue goes beyond the purpose of the
present work, in all our following studies we simply fix κ = 0.1.
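As an illustration of Eq. (1), the adoption probability can be evaluated as follows. This is a minimal sketch; the function name `adoption_probability` and the argument name `kappa` (the noise parameter) are our own labels, and the overflow guard is an implementation detail not discussed in the text.

```python
import math


def adoption_probability(u_i: float, u_j: float, kappa: float = 0.1) -> float:
    """Probability that player i adopts the strategy of learning neighbor j."""
    x = (u_i - u_j) / kappa
    if x > 700.0:  # exp(x) would overflow a double; the probability is ~0
        return 0.0
    return 1.0 / (1.0 + math.exp(x))
```

Equal payoffs give a probability of exactly 1/2, and for small kappa the rule approaches deterministic imitation of the more successful player.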

In both models I and II, the lower interaction graph is a
square lattice with periodic boundary conditions and of size
N = 200 × 200. For model I, we denote the size of the LN
of the players by d, where d = 1, 2, ... indicate, respectively,
that each player can learn (or get payoff and strategy
information after each round) from their nearest neighbors, nearest
and next-nearest neighbors, and so on. For model II, the
upper learning graph is a scale-free network embedded on the
underlying square lattice, which can be constructed according
to the following steps, associated with the lattice-embedded
scale-free network (LESFN) model [23]: For each site
of the underlying interaction graph, a prescribed degree k is
assigned, taken from a scale-free distribution P(k) ~ k^{-γ},
k ∈ [4, N). A node (say i, with degree k_i) is picked out
randomly and connected to its closest neighbors until its
degree quota k_i is realized or until all sites up to a distance
r(k_i) = min(A√k_i, √N) have been explored, where A is the
territory parameter [23] (in the present work, we set A = 10).
Duplicate connections are avoided. This process is repeated
for all sites of the underlying lattice [24].

FIG. 1: Average density of cooperators, p_c, as a function of r for
different sizes of the LN on square lattices with asynchronous (a)
and synchronous (b) strategy updating. Predictions of p_c by the pair
approximation are shown in (c). The cases d = 1, 2, 3, 4 correspond
to, respectively, the conditions that the learning neighborhoods of
the players include their nearest neighbors, nearest and next-nearest
neighbors, and so on, while d → ∞ means that each player can learn
from the whole population.

FIG. 2: Average density of cooperators, p_c, as a function of r on
square lattices with synchronous strategy updating, where the
learning networks are the LESFNs built on the underlying square lattices
with different decay exponents γ. For the sake of comparison, the
case that each player can learn from the whole population is also
shown by solid stars.
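The LESFN construction steps described in this section can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the procedure of Ref. [23] in full: the function name `build_lesfn` and its arguments are hypothetical, distances are taken as Euclidean with periodic boundaries, and no upper limit is imposed on the degree of the target site (the text above does not mention one).

```python
import math
import random


def build_lesfn(L=10, gamma=2.0, A=10.0, seed=0):
    """Return (quota, adj): degree quotas and adjacency sets on an L x L lattice."""
    rng = random.Random(seed)
    N = L * L
    degrees = range(4, N)                       # k in [4, N)
    weights = [k ** (-gamma) for k in degrees]  # P(k) ~ k^(-gamma)
    quota = rng.choices(degrees, weights=weights, k=N)

    def dist(a, b):
        # Euclidean distance with periodic boundaries (an assumption).
        ax, ay = divmod(a, L)
        bx, by = divmod(b, L)
        dx = min(abs(ax - bx), L - abs(ax - bx))
        dy = min(abs(ay - by), L - abs(ay - by))
        return math.hypot(dx, dy)

    adj = [set() for _ in range(N)]
    order = list(range(N))
    rng.shuffle(order)                          # sites are picked in random order
    for i in order:
        radius = min(A * math.sqrt(quota[i]), math.sqrt(N))
        closest_first = sorted((j for j in range(N) if j != i),
                               key=lambda j: dist(i, j))
        for j in closest_first:
            if len(adj[i]) >= quota[i] or dist(i, j) > radius:
                break
            adj[i].add(j)                       # sets avoid duplicate links
            adj[j].add(i)
    return quota, adj
```

Using sets for the adjacency lists makes the "duplicate connections are avoided" step automatic. The sort over all sites is quadratic in N, which is acceptable for a small demonstration lattice but not for N = 200 × 200.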



III. RESULTS 

First, we study the two models by Monte Carlo (MC) sim- 
ulations started from a random initial distribution of C and D 
strategies. By varying the value of r, both asynchronous and 
synchronous strategy-updating are implemented for model I, 
and only synchronous for model II. The total sampling times
are 16000 MC steps and up to 24000 for model II when
γ < 2.0. The stationary state is characterized by the average
density of cooperators p_c, calculated by averaging over
the last 4000 steps, when the values of d and γ are varied
systematically. All the simulation data shown in Figs. 1, 2, and
3 result from an average over either ten realizations of
independent initial strategy configurations (for model I) or ten
realizations of the learning graphs (for model II).
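For orientation, an asynchronous MC run for model I might look like the sketch below. It is a toy version under stated assumptions: the names `learning_offsets`, `payoff`, and `simulate` are hypothetical, the LN of size d is assumed to contain all sites within d lattice steps, the lattice is far smaller than 200 × 200, and the returned value is the final cooperator density rather than an average over the last steps.

```python
import math
import random


def learning_offsets(d):
    """Offsets of sites within d lattice steps (assumed reading of 'size d')."""
    return [(dx, dy) for dx in range(-d, d + 1) for dy in range(-d, d + 1)
            if 0 < abs(dx) + abs(dy) <= d]


def payoff(grid, x, y, r):
    """Total payoff against the four nearest neighbors (1 = C, 0 = D)."""
    L = len(grid)
    me, total = grid[x][y], 0.0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        other = grid[(x + dx) % L][(y + dy) % L]
        if me == 1:
            total += 1.0 if other == 1 else -r
        else:
            total += 1.0 + r if other == 1 else 0.0
    return total


def simulate(L=20, d=2, r=0.02, kappa=0.1, mc_steps=50, seed=0):
    """Asynchronous updating; returns the final density of cooperators."""
    rng = random.Random(seed)
    grid = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]
    offsets = learning_offsets(d)
    for _ in range(mc_steps):
        for _ in range(L * L):                 # one MC step = L*L updates
            x, y = rng.randrange(L), rng.randrange(L)
            dx, dy = rng.choice(offsets)
            nx, ny = (x + dx) % L, (y + dy) % L
            if grid[x][y] == grid[nx][ny]:
                continue                       # nothing to learn
            diff = (payoff(grid, x, y, r) - payoff(grid, nx, ny, r)) / kappa
            w = 0.0 if diff > 700.0 else 1.0 / (1.0 + math.exp(diff))
            if rng.random() < w:
                grid[x][y] = grid[nx][ny]
    return sum(map(sum, grid)) / (L * L)
```

A synchronous variant would compute all payoffs first and then update every site from a frozen copy of the grid. Quantitative agreement with the figures should not be expected from such small lattices and short runs.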

Let us first discuss the MC results obtained for model I.
The dependence of p_c on r in the stationary state for different
sizes of the LN, d, is illustrated in Figs. 1(a) and 1(b). For d = 1,
i.e., when the IN and LN are identical, and with asynchronous
strategy updating, we recover the results of [17]: cooperators
persist at substantial levels if r is sufficiently small [Fig. 1(a)].
Synchronous strategy updating gives rise to a smaller threshold
r_c, beyond which cooperators vanish [Fig. 1(b)]. It is
interesting that for d = 2, i.e., when a player can learn from its
next-nearest neighbors in addition to its nearest neighbors, both
asynchronous and synchronous strategy updating lead to
qualitatively as well as quantitatively the same stationary density
p_c. For the even larger sizes d = 3 and 4, though the
qualitative behaviors are similar, their quantitative properties are
distinct [somewhat greater values of the threshold r_c in Fig.
1(b) for synchronous dynamics]. In particular, for d → ∞,
which corresponds to the case that each player can learn from
the whole population, cooperators cannot persist in the
system for any finite positive value of r when updating
asynchronously, whereas they can be maintained at considerable
levels if r is very small when updating synchronously.

FIG. 3: Average density of cooperators, p_c, as a function of the
decay exponent γ of the LESFNs for two special values of r = 0.017
and 0.023.

In addition to the above points, it is worth pointing out that,
when the LN is larger than the IN, e.g., d = 2, 3, and 4, there
arise two absorbing states (all C and all D, respectively)
separated by an active state (coexistence of C and D) over the
range of r, i.e., cooperators can "wipe out" defectors or
dominate in the system if the players are allowed to get payoff
and strategy information from neighbors further away than
their interacting neighbors. That is to say, cooperation is
promoted by the difference between the IN and LN.
Figures 1(a) and 1(b) illustrate clearly the remarkable
enhancement appearing in the case of d = 2 for both synchronous and
asynchronous dynamics. For synchronous strategy updating,
as long as the size of the LN is larger than that of
the IN, the cooperative behavior is always enhanced to some
extent as compared to the case of d = 1. For asynchronous
strategy updating, however, the promotion of cooperation is
only realized in a very small range of r when d > 2, and this
range decreases with increasing d and vanishes in the limit of
d → ∞.

The mean-field approximation predicts p_c = 0 for any
value of 0 < r < 1 [15, 17, 20]. The nonzero values of p_c
(dependent on r) cannot be described by the mean-field
approach. To characterize the evolution of p_c, the more
sophisticated pair approximation provides an analytically accessible
way to determine the corrections arising from spatial structural
correlations of the players. Instead of the equilibrium density of
C and D, the pair approximation considers the frequency of
strategy pairs C-D (see the Appendix of Refs. [15, 17] or the
Supplementary Information of Ref. [16] for details). However,
these existing methods are designed for the condition
where the interaction and learning graphs are identical. When
these two graphs are different, we should make some
modifications to the original approach.

FIG. 4: Typical time series of the ratio of the average learning degrees
of cooperators and defectors, ⟨K_c⟩/⟨K_d⟩, for two special values of
r = 0.012 and 0.028 in the stationary state. The upper LESFN has
a decay exponent γ = 2.0.

In the present work, we use an extended pair approximation
(see the Appendix) to calculate the density p_c by varying the
values of r and d for model I. The results are shown in Fig.
1(c). The extended pair approximation correctly predicts the
trends of p_c, especially for d = 2, but
significantly underestimates the benefits of spatial structural
effects and the larger size of the LN (than the IN) at low r,
whereas it overestimates those at high r. Despite this point, it
verifies the above result obtained by MC simulation, i.e., the
most remarkable enhancement of cooperation takes place at
d = 2. For d = 3, it fits the synchronous results better than
the asynchronous ones [according to the magnitude
relationship between the curve for d = 3 and that for d = 1 in
Figs. 1(a)-1(c)], despite the fact that the pair approximation
is based on the assumption of continuous time, and hence on
asynchronous updating. In particular, for d = 4, it correctly
predicts the occurrence of an intersection with the curve for
d = 1, as previously found by MC simulation in Fig. 1(a).
For d → ∞, it once again correctly forecasts the extinction of
cooperators (p_c = 1/2 for r = 0, and p_c = 0 for any other
finite positive value of r).

We now focus our attention on the influence of the
heterogeneous LN on the evolution of cooperation. The MC results
obtained for model II with different values of γ are
summarized in Fig. 2. The case γ → ∞ is equivalent to the case
d = 1 studied in model I. With decreasing γ (yielding an
increase in the average degree of the LN), the cooperative level
increases gradually until γ ≈ 1.7 ± 0.2, where p_c reaches its
maximum, and then it gradually decreases as γ goes to zero.
In finite-size systems, for vanishing γ in model II, the
evolutionary results are expected to tend toward the (unattainable)
case of d → ∞ in model I, since on average the players have
more and more learning neighbors. For the sake of clarity, in
Fig. 2 we also show the result for d → ∞ obtained in model I
by solid stars.

Note that, just as was found in model I, if the LN and IN
are different, then cooperation is promoted, and the maximum
enhancement is achieved at a moderate level of the available
information of the LN. Too little information as well as too
much information favors defection. To support this point, we
have also studied the density of cooperators as a function of
the size of the LN of the players (characterized by γ) for two
special values of r = 0.017 and 0.023. The MC results are
plotted in Fig. 3. We can clearly observe that peak values of
p_c arise in the middle range of γ for both values of r,
analogously to the so-called coherence resonance [25]. Recent
research has described many mechanisms that can
lead to this coherence-resonance phenomenon in studies of the
PD game. For example, in Ref. [25] additive noise on the
classical replicator dynamics can enhance the average payoff
of the population in a resonance-like manner. By introducing
random disorder in the payoff matrix, Perc [26] found
a resonance-like behavior of the density of cooperators,
which reaches its maximum at an intermediate disorder. On
static complex networks, Tang et al. [27] obtained the result
that the maximum cooperation level occurs at an intermediate
average degree. Ren et al. [28] studied the effects of both
topological randomness in individual relationships and dynamical
randomness in decision making on the evolution of
cooperation, and found that there exists an optimal moderate level of
randomness, which can induce the highest level of
cooperation. Our result presented here, i.e., enhancing cooperation by
increasing the LN, which resembles a coherence-resonance-like
behavior, provides a different example of this dynamical
phenomenon. It will enrich our knowledge of the evolution of
cooperation in nature.

More recently, Ohtsuki et al. [29] studied the evolution of
cooperation in the evolutionary spatial PD game, wherein the
interaction graph and replacement (or learning) graph are
separated. They considered three different update rules for
evolutionary dynamics: birth-death, death-birth, and imitation [29].
By both analytical treatment and computer simulations, they
found that under death-birth and imitation updating, the
optimum population structure for cooperators is given by
maximum overlap between the interaction and the replacement
graph, i.e., whenever the two graphs are identical [29]. Any
existing difference between these two graphs will benefit
defectors. This result holds for weak selection (which means
that the payoffs obtained by the individuals from the game
make only a slight contribution to their fitness) and large
population size. The "imitation" updating in [29] is implemented as
follows. A random individual is chosen to update its strategy;
it will either stay with its own strategy or imitate one of the
neighbors' strategies proportional to fitness. In fact, from
this point of view, the update mechanism (or evolutionary
dynamics) of our model, Eq. (1), can also be regarded as
imitation, where the fitness of each individual is determined by
an exponential function of its payoff U obtained from the game,
e^{U/κ}. (Whenever the state of the population is updated, the
focal individual and a randomly chosen neighbor from
its LN compete for reproduction proportional to their fitness
according to this function.) However, we obtain a remarkably
different result as compared to [29], i.e., in our model, the
difference between the IN and LN can substantially favor
cooperators over defectors (especially for the case of synchronous
updating). Since the evolutionary outcomes are dependent
on the updating rules, and there are many possible updating
dynamics on graphs, we think the detailed evolutionary rules
give rise to this different result in contrast to that of [29]. In
addition, in the present model the fitness of the individuals is
closely related to their payoff, which can be regarded as strong
selection, while the result in [29] is obtained in the limit of
weak selection. Thus our present results enrich our
knowledge of the evolution of cooperation in the PD game when the
IN and LN are separated.



Since in model II the players possess an inhomogeneous
LN, we would like to investigate the effect of this heterogeneity
on the players' strategy selection. In Fig. 4 we display
the typical stationary-state time series of the ratio of the
average learning degrees of cooperators and defectors (calculated
as the total number of learning neighbors of the players adopting
a certain strategy divided by the total number of the players
adopting this strategy), ⟨K_c⟩/⟨K_d⟩, for two special values of
r = 0.012 and 0.028 (cooperators and defectors dominate in the two
cases, respectively). The upper LESFN has a decay exponent
γ = 2.0. We can observe that in the stationary state the
average learning degree of the cooperators is always larger than
that of the defectors, which indicates that the more learning
channels the players possess, the greater is the probability that
they cooperate with others.

Finally, we want to point out the difference between our
results and those of Ref. [18], in which Ifti et al. studied
the case where the IN and LN are different in the continuous
PD game, and observed that in the lattice topology, when
the neighborhood sizes for interacting and learning differ by
more than 0.5, cooperation cannot persist in the population.
This is not the case for the models studied here, wherein the
players are pure strategists. Cooperation can be maintained
at considerable levels in the cases where the size of the LN
is far larger than that of the IN [see Figs. 1(a) and 1(b) for
d = 3, 4 and Fig. 2], and can go so far as to wipe out
defectors for sufficiently small r (homogeneous state C in Figs.
1 and 2). In particular, as long as the strategy updating is
implemented synchronously, cooperation is always substantially
promoted when the IN and LN are different (no matter how
large the LN is) as compared with the case when they are identical.



IV. CONCLUSIONS



In summary, we have explored the influence of the
difference between interaction and learning neighborhoods on the
evolution of cooperation. This is done by studying an
evolutionary spatial PD game wherein the interaction and learning
graphs of the players are different. The players are placed on
two layered graphs, where the lower layer is the physical
infrastructure on which the interactions are taking place and the
upper layer represents the skeleton along which the payoff and
strategy information flows. For the sake of simplicity, we keep the
interaction graph fixed and vary the size of the neighborhood
in the learning graph. Two types of models have been
systematically studied: In model I, we simply increase the size of
the learning neighborhood for all the players at the same level;
and in model II, we endow the players with heterogeneous
abilities to obtain information. We performed MC simulations
for both models. For model I, we also use an extended pair
approximation to evaluate the average density of cooperators,
p_c, and make a comparison with the corresponding results
that follow from our MC simulations.



The main result is that a difference between the interaction
and learning graphs can promote cooperation substantially.
The results of this mechanism resemble a coherence-resonance-like
behavior. For model I the maximum enhancement
is achieved at d = 2, i.e., when the players, in addition
to their nearest neighbors, can also learn from their
next-nearest neighbors; for model II, it is realized at the middle
level of the available information of learning neighbors. Too
little learning information favors defection, but apparently so
does too much information (especially for asynchronous
strategy updating). However, as long as the strategy updating is
implemented synchronously, cooperation is always substantially
promoted when choosing a larger size of neighborhood in the
learning graph. This point is also verified by the extended
version of the pair-approximation method. In model II, where
the players possess heterogeneous learning neighborhoods,
we found that the more learning neighbors a player has, the
greater the probability it will cooperate with others. There are
few existing works studying the evolutionary PD game on
networks with distinct interaction and learning neighborhoods.
Thus our present results provide a further perspective on
understanding the emergence and persistence of cooperation in
realistic systems.

In future work, a concise explanation of the mechanism
supporting cooperation should be revealed by more
sophisticated analytical methods. Furthermore, it would be
interesting to allow the interaction neighborhood and/or the
learning neighborhood to be mutable during the process of the
dynamics (just as has been done in the case of the continuous
prisoner's dilemma game [18]), i.e., to study the effects of
annealed and quenched randomness in the interaction and/or
learning partnership for a fixed number of coplayers [20]. Work
along these lines is in progress.
APPENDIX: EXTENDED PAIR-APPROXIMATION METHOD

When the players are pure strategists, an analytical
approximation of the spatial dynamics can be obtained using the pair
approximation (a detailed survey of this technique is given in
the Appendix of the recent review paper [15], and somewhat
brief yet clear versions can be found in the Supplementary
Information of [16] and also in the Appendix of [17]). Instead of
considering the density of strategies as in well-mixed
populations, i.e., in mean-field theory, pair approximation tracks the
densities of strategy pairs. For the evolutionary PD games
studied here, that is to say, we will first address the
probabilities p_{c,c} and p_{c,d} of finding an individual playing strategy
C accompanied by a neighbor playing C or D, respectively.
Then the density of C is given by p_c = p_{c,c} + p_{c,d}. For more
details, we refer the readers to Refs. [15, 16, 17]. Here we just
make extensions to the approach to study model I, where the
interaction and learning graphs are different. As an example,
we will consider the case of d = 4 (extensions to other cases
are straightforward).

For d = 4, each player interacts with its four nearest
neighbors on a square lattice, but can learn from those neighbors
at a longer (Euclidean) distance up to 4. Since these
learning neighbors satisfy the condition of rotation symmetry, we
will consider only those neighbors falling in the first quadrant
(see Fig. 5). Whenever a randomly chosen site A updates
its strategy, a neighbor B is randomly selected from its
learning neighborhood as a reference. Their common neighbors
(if any) as well as their respective neighbors are considered
to be independent by the pair approximation. Thus, when the
selected reference is its nearest (next-nearest) neighbor, we
will refer to the configuration Fig. 5(b) [Fig. 5(c)]; and for
other cases we will refer to the configuration Fig. 5(d).
Assuming the selected learning neighbor B is A's third neighbor
(next-next-nearest neighbor), then we will use the scheme Fig.
5(d) to calculate changes in the pair configuration
probabilities p_{A,i} → p_{B,i}.

The payoffs P_A and P_B of A and B are determined by
accumulating the payoffs in interactions with their neighbors
x, y, z, i and u, v, w, j, respectively. The pair approximation
is completed by determining the evolution of the pair
configuration probabilities, i.e., the probability that the pair p_{A,i}
becomes p_{B,i}:

$$p_{A,i \to B,i} = \sum_{x,y,z} \sum_{u,v,w} \sum_{i,j} f(P_B - P_A)\, \frac{p_{x,A}\, p_{y,A}\, p_{z,A}\, p_{A,i}\, p_{i,j}\, p_{j,B}\, p_{u,B}\, p_{v,B}\, p_{w,B}}{p_A^3\, p_B^3\, p_i\, p_j}, \tag{A.1}$$



where the transition probability f(P_B - P_A) [see Eq. (1)]
is multiplied by the configuration probability and summed
over all possible configurations. If B succeeds in taking
over site A, the following pair configuration probabilities
increase: p_{x,B}, p_{y,B}, p_{z,B}, p_{B,i}, while the probabilities
p_{x,A}, p_{y,A}, p_{z,A}, p_{A,i} decrease. It is easy to analyze the other
cases of B (i.e., not the third neighbor of A), which lead to
only a slightly different form of Eq. (A.1). All these changes
result in a set of ordinary differential equations:
FIG. 5: (Color online) Illustration of the lower interaction graph (a square lattice), where the central site (filled square) has a learning
neighborhood of size d = 4 [only those neighbors falling in the first quadrant are shown] (a), and the corresponding schemes used for the pair
approximation with involved sites A, x, y, z, i, j, u, v, w, and B [(b)-(d)]. These schemes are used to determine changes in the pair
configuration probabilities p_{A,B} → p_{B,B} (b), p_{A,i} → p_{B,i} (c) and (d).



TABLE I: The probabilities of selecting the nearest neighbors as
references, h1, the next-nearest neighbors, h2, and the remaining cases,
h3, for different sizes of the learning neighborhood.

              h1      h2      h3
d = 1         1       0       0
d = 2         1/3     2/3     0
d = 3         1/6     1/3     1/2
d = 4 (a)     1/10    1/5     7/10
d → ∞         ≈ 0     ≈ 0     ≈ 1

(a) As an example, the values of h1, h2, h3 for d = 4 can be easily counted
out by using the symbolized sites in Fig. 5(a).


where n_c(x, y, z) is the number of cooperators among the
neighbors x, y, z, and P_c(x, y, z) and P_d(x, y, z) specify the
payoffs of a cooperator (defector) interacting with the
neighbors x, y, z plus a defector (cooperator). h1, h2, and h3
denote the probabilities of selecting the first, second, and
third or more distant neighbors as references, respectively (see Table I).
For simplicity, the above two equations omit the common
factor 2p_{c,d}/(p_c p_d), which is inessential [17]. In combination
with the symmetry condition p_{c,d} = p_{d,c} and the constraint
p_{c,c} + p_{c,d} + p_{d,c} + p_{d,d} = 1, the above equations can be
treated either by numerical integration or by setting dp_{c,c}/dt =
dp_{c,d}/dt = 0 and solving for p_{c,c} and p_{c,d}. Then the equilibrium
density of cooperators is obtained from p_c = p_{c,c} + p_{c,d}.